Overview

Dataset statistics

Number of variables: 13
Number of observations: 800
Missing cells: 386
Missing cells (%): 3.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 194.7 KiB
Average record size in memory: 249.3 B
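
The percentages and sizes above can be cross-checked with a little arithmetic. The sketch below assumes that "Missing cells (%)" is the missing-cell count divided by variables × observations, and that the average record size is the total memory divided by the number of observations; both assumptions agree with the reported figures up to rounding.

```python
# Values copied from the "Dataset statistics" block of this report.
n_variables = 13
n_observations = 800
missing_cells = 386
total_size_kib = 194.7  # already rounded by the report

# Assumption: the percentage is taken over all cells (rows x columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: average record size = total size / number of observations.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

The recomputed record size differs from the reported 249.3 B in the last digit, which is expected because the total size is itself a rounded value.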

Variable types

NUM: 9
CAT: 3
BOOL: 1

Reproduction

Analysis started: 2020-03-28 01:41:01.196124
Analysis finished: 2020-03-28 01:41:17.918743
Version: pandas-profiling v2.5.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]

Warnings

Name has a high cardinality: 800 distinct values (High cardinality)
Generation is highly correlated with # (High correlation)
# is highly correlated with Generation (High correlation)
Type 2 has 386 (48.3%) missing values (Missing)

Variables

#
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM
Distinct count: 721
Unique (%): 90.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 362.81375
Minimum: 1
Maximum: 721
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 34.95
Q1: 184.75
median: 364.5
Q3: 539.25
95-th percentile: 689.05
Maximum: 721
Range: 720
Interquartile range (IQR): 354.5

Descriptive statistics

Standard deviation: 208.3437976
Coefficient of variation (CV): 0.574244492
Kurtosis: -1.165705095
Mean: 362.81375
Median Absolute Deviation (MAD): 179.1546812
Skewness: -0.001122502762
Sum: 290251
Variance: 43407.13798
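
Several of the statistics reported for # are derived from one another, so they can be checked without the raw data. The following is a minimal sketch using only numbers printed in this report; the formulas (mean = sum / n, CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 − Q1, range = maximum − minimum) are the standard definitions and are assumed, not confirmed, to be the ones the tool uses.

```python
import math

# Figures copied from the "#" variable in this report.
n = 800
total = 290251           # Sum
std = 208.3437976        # Standard deviation
q1, q3 = 184.75, 539.25  # Q1 and Q3
minimum, maximum = 1, 721

mean = total / n                 # arithmetic mean
cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance as the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range

print(mean, cv, variance, iqr, value_range)
```

The recomputed values agree with the reported Mean, CV, Variance, IQR and Range to within the precision shown in the report.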
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1. 721.], "bayesian blocks" binning strategy used)

Common values

Value               Count  Frequency (%)
479                     6  0.8%
386                     4  0.5%
711                     4  0.5%
710                     4  0.5%
150                     3  0.4%
6                       3  0.4%
413                     3  0.4%
646                     3  0.4%
303                     2  0.2%
302                     2  0.2%
Other values (711)    766  95.8%

Minimum 5 values

Value  Count  Frequency (%)
1          1  0.1%
2          1  0.1%
3          2  0.2%
4          1  0.1%
5          1  0.1%

Maximum 5 values

Value  Count  Frequency (%)
721        1  0.1%
720        2  0.2%
719        2  0.2%
718        1  0.1%
717        1  0.1%

Name
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE
Distinct count: 800
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 KiB

Common values

Value                  Count  Frequency (%)
LandorusTherian Forme      1  0.1%
Totodile                   1  0.1%
Golett                     1  0.1%
Heatran                    1  0.1%
Frillish                   1  0.1%
Staryu                     1  0.1%
Mismagius                  1  0.1%
Gothitelle                 1  0.1%
LatiasMega Latias          1  0.1%
WormadamTrash Cloak        1  0.1%
Other values (790)       790  98.8%
 

Length

Max length: 25
Mean length: 8.84125
Min length: 3

Unicode categories

Value              Count  Frequency (%)
Lowercase_Letter      27  42.9%
Uppercase_Letter      26  41.3%
Decimal_Number         3  4.8%
Other_Punctuation      3  4.8%
Other_Symbol           2  3.2%
Space_Separator        1  1.6%
Dash_Punctuation       1  1.6%

Unicode scripts

Value   Count  Frequency (%)
Latin      53  84.1%
Common     10  15.9%

Unicode blocks

Value         Count  Frequency (%)
ASCII            60  96.8%
Misc Symbols      2  3.2%
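
The category names in the first of these tables (Lowercase_Letter, Decimal_Number, and so on) are the long names of Unicode general categories. As a rough illustration, the sketch below counts the distinct characters of a few names from this report by general category using Python's standard `unicodedata` module. The helper `distinct_char_categories` is hypothetical and not part of pandas-profiling, and counting distinct characters rather than occurrences is an assumption about how the table was built. Note that `unicodedata.category` returns the two-letter abbreviations (`Ll` for Lowercase_Letter, `Lu` for Uppercase_Letter, `Nd` for Decimal_Number, `Po` for Other_Punctuation, `Zs` for Space_Separator).

```python
import unicodedata
from collections import Counter

def distinct_char_categories(values):
    """Count distinct characters per Unicode general category.

    Hypothetical helper for illustration only; it is not the
    implementation used by pandas-profiling.
    """
    distinct_chars = set("".join(values))
    return Counter(unicodedata.category(ch) for ch in distinct_chars)

# Three names taken from the sample rows of this report.
names = ["Bulbasaur", "Zygarde50% Forme", "CharizardMega Charizard X"]
counts = distinct_char_categories(names)

for category, count in counts.most_common():
    print(category, count)
```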
 

Type 1
Categorical

Distinct count: 18
Unique (%): 2.2%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 KiB

Common values

Value             Count  Frequency (%)
Water               112  14.0%
Normal               98  12.2%
Grass                70  8.8%
Bug                  69  8.6%
Psychic              57  7.1%
Fire                 52  6.5%
Rock                 44  5.5%
Electric             44  5.5%
Ghost                32  4.0%
Dragon               32  4.0%
Other values (8)    190  23.8%

Length

Max length: 8
Mean length: 5.26
Min length: 3

Unicode categories

Value             Count  Frequency (%)
Lowercase_Letter     17  60.7%
Uppercase_Letter     11  39.3%

Unicode scripts

Value  Count  Frequency (%)
Latin     28  100.0%

Unicode blocks

Value  Count  Frequency (%)
ASCII     28  100.0%

Type 2
Categorical

MISSING
Distinct count: 18
Unique (%): 4.3%
Missing: 386
Missing (%): 48.3%
Memory size: 6.4 KiB

Common values

Value             Count  Frequency (%)
Flying               97  12.1%
Ground               35  4.4%
Poison               34  4.2%
Psychic              33  4.1%
Fighting             26  3.2%
Grass                25  3.1%
Fairy                23  2.9%
Steel                22  2.8%
Dark                 20  2.5%
Dragon               18  2.2%
Other values (8)     81  10.1%
(Missing)           386  48.2%

Length

Max length: 8
Mean length: 4.3725
Min length: 3

Unicode categories

Value             Count  Frequency (%)
Lowercase_Letter     17  60.7%
Uppercase_Letter     11  39.3%

Unicode scripts

Value  Count  Frequency (%)
Latin     28  100.0%

Unicode blocks

Value  Count  Frequency (%)
ASCII     28  100.0%

Total
Real number (ℝ≥0)

Distinct count: 200
Unique (%): 25.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 435.1025
Minimum: 180
Maximum: 780
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.4 KiB

Quantile statistics

Minimum: 180
5-th percentile: 250
Q1: 330
median: 450
Q3: 515
95-th percentile: 630
Maximum: 780
Range: 600
Interquartile range (IQR): 185

Descriptive statistics

Standard deviation: 119.9630398
Coefficient of variation (CV): 0.2757121362
Kurtosis: -0.5074607103
Mean: 435.1025
Median Absolute Deviation (MAD): 100.2656438
Skewness: 0.1525299234
Sum: 348082
Variance: 14391.13091
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[180. 244.5 299.5 301. 304.5 ... 585. 597. 605. 710. 780. ], "bayesian blocks" binning strategy used)

Common values

Value               Count  Frequency (%)
600                    37  4.6%
405                    26  3.2%
500                    23  2.9%
580                    23  2.9%
300                    19  2.4%
490                    18  2.2%
525                    16  2.0%
480                    15  1.9%
495                    15  1.9%
330                    15  1.9%
Other values (190)    593  74.1%

Minimum 5 values

Value  Count  Frequency (%)
180        1  0.1%
190        1  0.1%
194        1  0.1%
195        3  0.4%
198        1  0.1%

Maximum 5 values

Value  Count  Frequency (%)
780        3  0.4%
770        2  0.2%
720        1  0.1%
700        9  1.1%
680       13  1.6%

HP
Real number (ℝ≥0)

Distinct count: 94
Unique (%): 11.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 69.25875
Minimum: 1
Maximum: 255
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 35.95
Q1: 50
median: 65
Q3: 80
95-th percentile: 110
Maximum: 255
Range: 254
Interquartile range (IQR): 30

Descriptive statistics

Standard deviation: 25.53466903
Coefficient of variation (CV): 0.368685098
Kurtosis: 7.232078374
Mean: 69.25875
Median Absolute Deviation (MAD): 18.84048125
Skewness: 1.568224376
Sum: 55407
Variance: 652.0193226
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1. 29. 39.5 40.5 44.5 ... 100.5 104.5 110.5 155. 255. ], "bayesian blocks" binning strategy used)

Common values

Value              Count  Frequency (%)
60                    67  8.4%
50                    63  7.9%
70                    57  7.1%
65                    46  5.8%
75                    43  5.4%
80                    43  5.4%
40                    38  4.8%
45                    38  4.8%
55                    37  4.6%
100                   32  4.0%
Other values (84)    336  42.0%

Minimum 5 values

Value  Count  Frequency (%)
1          1  0.1%
10         1  0.1%
20         6  0.8%
25         2  0.2%
28         1  0.1%

Maximum 5 values

Value  Count  Frequency (%)
255        1  0.1%
250        1  0.1%
190        1  0.1%
170        1  0.1%
165        1  0.1%

Attack
Real number (ℝ≥0)

Distinct count: 111
Unique (%): 13.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 79.00125
Minimum: 5
Maximum: 190
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.4 KiB

Quantile statistics

Minimum: 5
5-th percentile: 30
Q1: 55
median: 75
Q3: 100
95-th percentile: 136.2
Maximum: 190
Range: 185
Interquartile range (IQR): 45

Descriptive statistics

Standard deviation: 32.45736587
Coefficient of variation (CV): 0.4108462318
Kurtosis: 0.1697173149
Mean: 79.00125
Median Absolute Deviation (MAD): 25.82881875
Skewness: 0.551613748
Sum: 63201
Variance: 1053.480599
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 5. 29.5 31.5 34. 35.5 ... 120.5 129.5 130.5 152.5 190. ], "bayesian blocks" binning strategy used)

Common values

Value               Count  Frequency (%)
100                    40  5.0%
65                     39  4.9%
80                     37  4.6%
50                     37  4.6%
85                     33  4.1%
60                     33  4.1%
75                     32  4.0%
70                     31  3.9%
90                     30  3.8%
55                     30  3.8%
Other values (101)    458  57.2%

Minimum 5 values

Value  Count  Frequency (%)
5          2  0.2%
10         3  0.4%
15         1  0.1%
20         8  1.0%
22         1  0.1%

Maximum 5 values

Value  Count  Frequency (%)
190        1  0.1%
185        1  0.1%
180        3  0.4%
170        2  0.2%
165        3  0.4%

Defense
Real number (ℝ≥0)

Distinct count: 103
Unique (%): 12.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 73.8425
Minimum: 5
Maximum: 230
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.4 KiB

Quantile statistics

Minimum: 5
5-th percentile: 35
Q1: 50
median: 70
Q3: 90
95-th percentile: 130
Maximum: 230
Range: 225
Interquartile range (IQR): 40

Descriptive statistics

Standard deviation: 31.18350056
Coefficient of variation (CV): 0.422297465
Kurtosis: 2.72626036
Mean: 73.8425
Median Absolute Deviation (MAD): 23.8936
Skewness: 1.155912303
Sum: 59074
Variance: 972.4107071
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 5. 29. 34.5 36. 39.5 ... 120.5 129.5 130.5 155. 230. ], "bayesian blocks" binning strategy used)

Common values

Value              Count  Frequency (%)
70                    54  6.8%
50                    49  6.1%
60                    46  5.8%
80                    39  4.9%
40                    36  4.5%
65                    36  4.5%
90                    35  4.4%
100                   33  4.1%
55                    32  4.0%
45                    32  4.0%
Other values (93)    408  51.0%

Minimum 5 values

Value  Count  Frequency (%)
5          2  0.2%
10         1  0.1%
15         4  0.5%
20         4  0.5%
23         1  0.1%

Maximum 5 values

Value  Count  Frequency (%)
230        3  0.4%
200        2  0.2%
184        1  0.1%
180        3  0.4%
168        1  0.1%

Sp. Atk
Real number (ℝ≥0)

Distinct count: 105
Unique (%): 13.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 72.82
Minimum: 10
Maximum: 194
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.4 KiB

Quantile statistics

Minimum: 10
5-th percentile: 30
Q1: 49.75
median: 65
Q3: 95
95-th percentile: 131.05
Maximum: 194
Range: 184
Interquartile range (IQR): 45.25

Descriptive statistics

Standard deviation: 32.72229417
Coefficient of variation (CV): 0.4493586126
Kurtosis: 0.2978936607
Mean: 72.82
Median Absolute Deviation (MAD): 26.4182
Skewness: 0.7446624978
Sum: 58256
Variance: 1070.748536
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 10. 29.5 30.5 34. 35.5 ... 110.5 129.5 130.5 152. 194. ], "bayesian blocks" binning strategy used)

Common values

Value              Count  Frequency (%)
60                    51  6.4%
40                    49  6.1%
65                    44  5.5%
50                    39  4.9%
55                    35  4.4%
45                    33  4.1%
70                    30  3.8%
35                    29  3.6%
85                    27  3.4%
80                    27  3.4%
Other values (95)    436  54.5%

Minimum 5 values

Value  Count  Frequency (%)
10         3  0.4%
15         4  0.5%
20         8  1.0%
23         1  0.1%
24         2  0.2%

Maximum 5 values

Value  Count  Frequency (%)
194        1  0.1%
180        3  0.4%
175        1  0.1%
170        3  0.4%
165        2  0.2%

Sp. Def
Real number (ℝ≥0)

Distinct count: 92
Unique (%): 11.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 71.9025
Minimum: 20
Maximum: 230
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.4 KiB

Quantile statistics

Minimum: 20
5-th percentile: 32.95
Q1: 50
median: 70
Q3: 90
95-th percentile: 120
Maximum: 230
Range: 210
Interquartile range (IQR): 40

Descriptive statistics

Standard deviation: 27.8289158
Coefficient of variation (CV): 0.3870368318
Kurtosis: 1.628394057
Mean: 71.9025
Median Absolute Deviation (MAD): 22.02348125
Skewness: 0.8540186115
Sum: 57522
Variance: 774.4485544
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 20. 34.5 35.5 39.5 40.5 ... 103.5 105.5 121.5 157. 230. ], "bayesian blocks" binning strategy used)

Common values

Value              Count  Frequency (%)
80                    52  6.5%
50                    50  6.2%
55                    47  5.9%
65                    44  5.5%
60                    43  5.4%
75                    40  5.0%
70                    40  5.0%
90                    36  4.5%
45                    35  4.4%
85                    30  3.8%
Other values (82)    383  47.9%

Minimum 5 values

Value  Count  Frequency (%)
20         6  0.8%
23         1  0.1%
25        11  1.4%
30        20  2.5%
31         1  0.1%

Maximum 5 values

Value  Count  Frequency (%)
230        1  0.1%
200        1  0.1%
160        2  0.2%
154        3  0.4%
150        7  0.9%

Speed
Real number (ℝ≥0)

Distinct count: 108
Unique (%): 13.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 68.2775
Minimum: 5
Maximum: 180
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.4 KiB

Quantile statistics

Minimum: 5
5-th percentile: 25
Q1: 45
median: 65
Q3: 90
95-th percentile: 115
Maximum: 180
Range: 175
Interquartile range (IQR): 45

Descriptive statistics

Standard deviation: 29.06047372
Coefficient of variation (CV): 0.4256229903
Kurtosis: -0.2364366728
Mean: 68.2775
Median Absolute Deviation (MAD): 23.90915
Skewness: 0.3579332951
Sum: 54622
Variance: 844.5111327
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 5. 12.5 29.5 30.5 34.5 ... 110.5 114.5 115.5 132.5 180. ], "bayesian blocks" binning strategy used)

Common values

Value              Count  Frequency (%)
50                    46  5.8%
60                    44  5.5%
70                    37  4.6%
65                    36  4.5%
30                    35  4.4%
80                    33  4.1%
40                    32  4.0%
90                    31  3.9%
100                   31  3.9%
55                    30  3.8%
Other values (98)    445  55.6%

Minimum 5 values

Value  Count  Frequency (%)
5          2  0.2%
10         3  0.4%
15         9  1.1%
20        15  1.9%
22         1  0.1%

Maximum 5 values

Value  Count  Frequency (%)
180        1  0.1%
160        1  0.1%
150        4  0.5%
145        3  0.4%
140        2  0.2%

Generation
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 6
Unique (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.32375
Minimum: 1
Maximum: 6
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 3
Q3: 5
95-th percentile: 6
Maximum: 6
Range: 5
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.6612904
Coefficient of variation (CV): 0.4998241145
Kurtosis: -1.239575758
Mean: 3.32375
Median Absolute Deviation (MAD): 1.44465
Skewness: 0.01425810028
Sum: 2659
Variance: 2.759885795
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1. 1.5 2.5 6. ], "bayesian blocks" binning strategy used)

Common values

Value  Count  Frequency (%)
1        166  20.8%
5        165  20.6%
3        160  20.0%
4        121  15.1%
2        106  13.2%
6         82  10.2%

Minimum 5 values

Value  Count  Frequency (%)
1        166  20.8%
2        106  13.2%
3        160  20.0%
4        121  15.1%
5        165  20.6%

Maximum 5 values

Value  Count  Frequency (%)
6         82  10.2%
5        165  20.6%
4        121  15.1%
3        160  20.0%
2        106  13.2%

Legendary
Boolean

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 928.0 B

Common values

Value  Count  Frequency (%)
False    735  91.9%
True      65  8.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear relationship the steepness of the line, that is, its angle to the x-axis, does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
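
The calculation described above can be sketched in plain Python. This is an illustrative implementation of the textbook formula, not the code the report generator uses; the function name `pearson_r` is chosen here for illustration, and the input values are the Attack and Defense columns of the ten "First rows" shown in the Sample section.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

# Attack and Defense of the ten "First rows" in the Sample section.
attack = [49, 62, 82, 100, 52, 64, 84, 130, 104, 48]
defense = [49, 63, 83, 123, 43, 58, 78, 111, 78, 65]

r = pearson_r(attack, defense)
# Shifting and rescaling one variable by a positive factor leaves r unchanged.
r_rescaled = pearson_r([2 * a + 10 for a in attack], defense)
print(round(r, 4), round(r_rescaled, 4))
```

Because the same 1/n factor appears in the covariance and in both standard deviations, using the sample (n − 1) versions instead would give the same r.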

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic relationships than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
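
In other words, ρ is Pearson's r applied to ranks. A minimal sketch, assuming ties are given the average of the ranks they span (a common convention, not confirmed by this report); the helper names `average_ranks`, `pearson_r` and `spearman_rho` are illustrative only.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# A nonlinear but strictly increasing relationship.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

For this strictly increasing cubic relationship ρ reaches 1 while Pearson's r stays below 1, which is the difference the description above refers to.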

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
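
The definition above corresponds to the variant usually called τ-a. The sketch below implements it directly by enumerating all pairs; the function name `kendall_tau_a` is illustrative. Statistical libraries often report a tie-adjusted variant instead, so values computed this way may differ from the report's when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # The sign of the product tells whether x and y move in the same direction.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # Pairs tied in x or y count as neither concordant nor discordant.
    return (concordant - discordant) / len(pairs)

tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
print(tau)
```

In this small example two of the three pairs are concordant and one is discordant, giving τ = (2 − 1) / 3.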

Missing values

Sample

First rows

Index    #  Name                       Type 1   Type 2  Total   HP  Attack  Defense  Sp. Atk  Sp. Def  Speed  Generation  Legendary
0        1  Bulbasaur                  Grass    Poison    318   45      49       49       65       65     45           1  False
1        2  Ivysaur                    Grass    Poison    405   60      62       63       80       80     60           1  False
2        3  Venusaur                   Grass    Poison    525   80      82       83      100      100     80           1  False
3        3  VenusaurMega Venusaur      Grass    Poison    625   80     100      123      122      120     80           1  False
4        4  Charmander                 Fire     NaN       309   39      52       43       60       50     65           1  False
5        5  Charmeleon                 Fire     NaN       405   58      64       58       80       65     80           1  False
6        6  Charizard                  Fire     Flying    534   78      84       78      109       85    100           1  False
7        6  CharizardMega Charizard X  Fire     Dragon    634   78     130      111      130       85    100           1  False
8        6  CharizardMega Charizard Y  Fire     Flying    634   78     104       78      159      115    100           1  False
9        7  Squirtle                   Water    NaN       314   44      48       65       50       64     43           1  False

Last rows

Index    #  Name                       Type 1   Type 2  Total   HP  Attack  Defense  Sp. Atk  Sp. Def  Speed  Generation  Legendary
790    714  Noibat                     Flying   Dragon    245   40      30       35       45       40     55           6  False
791    715  Noivern                    Flying   Dragon    535   85      70       80       97       80    123           6  False
792    716  Xerneas                    Fairy    NaN       680  126     131       95      131       98     99           6  True
793    717  Yveltal                    Dark     Flying    680  126     131       95      131       98     99           6  True
794    718  Zygarde50% Forme           Dragon   Ground    600  108     100      121       81       95     95           6  True
795    719  Diancie                    Rock     Fairy     600   50     100      150      100      150     50           6  True
796    719  DiancieMega Diancie        Rock     Fairy     700   50     160      110      160      110    110           6  True
797    720  HoopaHoopa Confined        Psychic  Ghost     600   80     110       60      150      130     70           6  True
798    720  HoopaHoopa Unbound         Psychic  Dark      680   80     160       60      170      130     80           6  True
799    721  Volcanion                  Fire     Water     600   80     110      120      130       90     70           6  True
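
In the rows shown above, Total equals the sum of the six individual stats (HP, Attack, Defense, Sp. Atk, Sp. Def, Speed). The sketch below checks this for a handful of the displayed rows only; whether the relationship holds for every row of the dataset is not something this report states.

```python
# (name, total, hp, attack, defense, sp_atk, sp_def, speed),
# copied from the "First rows" and "Last rows" samples.
rows = [
    ("Bulbasaur", 318, 45, 49, 49, 65, 65, 45),
    ("VenusaurMega Venusaur", 625, 80, 100, 123, 122, 120, 80),
    ("Squirtle", 314, 44, 48, 65, 50, 64, 43),
    ("Noivern", 535, 85, 70, 80, 97, 80, 123),
    ("Volcanion", 600, 80, 110, 120, 130, 90, 70),
]

# Collect the names of rows whose stats do not add up to Total.
mismatches = [name for name, total, *stats in rows if sum(stats) != total]
print(mismatches)  # → []
```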